Some questions about the RPC spec


Thomas Leonard

Jul 19, 2017, 4:46:46 PM

to Cap'n Proto

Hi,

I'm trying to write an implementation of the RPC spec (level 1, in OCaml). I found a few parts of the spec unclear - could someone clarify them for me?

It says:

[ExportId]
> The exporter chooses an ID before sending a capability over the wire. If
> the capability is already in the table, the exporter should reuse the same ID.

But later:

[CapDescriptor]
> senderHosted @1 :ExportId;
> A capability newly exported by the sender. This is the ID of the new capability in the
> sender's export table (receiver's import table).

How can the exporter reuse the same ID, if it has to be newly exported?

[Message]
> This could be e.g. because the sender received an invalid or nonsensical
> message (`isCallersFault` is true) or because the sender had an internal error
> (`isCallersFault` is false).

isCallersFault appears to be deprecated (`obsoleteIsCallersFault` appears much later).

[Call.sendResultsTo]
> When `yourself` is used, the receiver must still send a `Return` for the call, but sets the
> field `resultsSentElsewhere` in that `Return` rather than including the results.

When should `resultsSentElsewhere` be returned? Once the result is known? Or
once the first takeFromOtherQuestion collects it?

Can takeFromOtherQuestion be used more than once for a single source question?

> The `Call` for bar'() has `sendResultsTo` set to `yourself`, with the value being the
> question ID originally assigned to the bar() call.

What does "the value" refer to here? `yourself` has type `Void`.

> Vat B receives the `Return` for bar'() and sends a `Return` for bar(), with
> `receivedFromYourself` set in place of the results.

`receivedFromYourself` does not appear anywhere else in the spec.

[Return.releaseParamCaps]
> If true, all capabilities that were in the params should be considered released.

Just to be sure: as if the sender had sent a release message for each one with `count=1`?

[Payload]
Why is it not possible to send exceptions in payloads? Should I export each
broken capability as an export and then immediately send a Resolve for each
one, resolving it to an exception?

[Resolve]
> When an export ID sent over the wire (e.g. in a `CapDescriptor`) is indicated to be a promise,
> this indicates that the sender will follow up at some point with a `Resolve` message. If the
> same `promiseId` is sent again before `Resolve`, still only one `Resolve` is sent. If the
> same ID is sent again later _after_ a `Resolve`, it can only be because the export's
> reference count hit zero in the meantime and the ID was re-assigned to a new export, therefore
> this later promise does _not_ correspond to the earlier `Resolve`.

It's not clear to me why it is useful for the receiver to know this.
Presumably the sender can't reuse an export ID until the receiver explicitly releases it anyway.
Should an implementation keep track of whether a resolve has arrived yet and behave differently based on this when it sees an export ID?

> The sender promises that from this point forth, until `promiseId` is released, it shall
> simply forward all messages to the capability designated by `cap`.

Does something similar apply to Return messages? Might be worth mentioning it there too.

[Disembargo]
> Embargos are used to enforce E-order in the presence of promise resolution. That is, if an
> application makes two calls foo() and bar() on the same capability reference, in that order,
> the calls should be delivered in the order in which they were made. But if foo() is called
> on a promise, and that promise happens to resolve before bar() is called, then the two calls
> may travel different paths over the network, and thus could arrive in the wrong order. In
> this case, the call to `bar()` must be embargoed, and a `Disembargo` message must be sent along
> the same path as `foo()` to ensure that the `Disembargo` arrives after `foo()`.

What does "this case" refer to? When exactly is an embargo needed, and when not?

> There are two particular cases where embargos are important. Consider object Alice, in Vat A,
> who holds a promise P, pointing towards Vat B, that eventually resolves to Carol.

Could Carol be another promise here? Should Alice wait until the target is fully resolved before doing a disembargo, or do a disembargo for each step?

[Accept]
> This message is also used to pick up a redirected return -- see `Return.redirect`.

`redirect` doesn't appear anywhere else in this spec. I guess it's `Return.sendResultsTo.thirdParty`.

[Network-specific Parameters]
> For interaction over the global internet between parties with no other prior arrangement, a
> particular set of bindings for these types is defined elsewhere. (TODO(someday): Specify where
> these common definitions live.)

Do these definitions exist now?

Thanks!

Ross Light

Jul 19, 2017, 5:57:28 PM

to Thomas Leonard, Cap'n Proto

Replies inline (with the disclaimer that I'm not Kenton, my only credentials are that I have stared at this file for a long time):

On Wed, Jul 19, 2017 at 1:46 PM Thomas Leonard <tal...@gmail.com> wrote:

Hi,

I'm trying to write an implementation of the RPC spec (level 1, in OCaml). I found a few parts of the spec unclear - could someone clarify them for me?

It says:

[ExportId]
> The exporter chooses an ID before sending a capability over the wire. If
> the capability is already in the table, the exporter should reuse the same ID.

But later:

[CapDescriptor]
> senderHosted @1 :ExportId;
> A capability newly exported by the sender. This is the ID of the new capability in the
> sender's export table (receiver's import table).

How can the exporter reuse the same ID, if it has to be newly exported?

That seems like a doc/spec typo. You can always specify an existing capability. I think the wording should be something like: "A capability exported by the sender. This may or may not be a new ID in the sender's export table (receiver's import table)."

[Message]
> This could be e.g. because the sender received an invalid or nonsensical
> message (`isCallersFault` is true) or because the sender had an internal error
> (`isCallersFault` is false).

isCallersFault appears to be deprecated (`obsoleteIsCallersFault` appears much later).

Yup, Exception has changed (IMO for the better). Instead of placing blame on sender or receiver (such distinctions are hard to draw in general), exceptions are now about what action the caller is advised to take based on the failure.

[Call.sendResultsTo]
> When `yourself` is used, the receiver must still send a `Return` for the call, but sets the
> field `resultsSentElsewhere` in that `Return` rather than including the results.

When should `resultsSentElsewhere` be returned? Once the result is known? Or
once the first takeFromOtherQuestion collects it?

(I haven't implemented this for Go yet, but want to.) AFAICT resultsSentElsewhere should be sent once the result is known.

Can takeFromOtherQuestion be used more than once for a single source question?

I would assume that it could be used until a Finish message is sent for that question, much like other question-based data. In practice, every call's result is held in the answers table until Finish is received.

> The `Call` for bar'() has `sendResultsTo` set to `yourself`, with the value being the
> question ID originally assigned to the bar() call.

What does "the value" refer to here? `yourself` has type `Void`.

> Vat B receives the `Return` for bar'() and sends a `Return` for bar(), with
> `receivedFromYourself` set in place of the results.

`receivedFromYourself` does not appear anywhere else in the spec.

I think this whole example is stale and probably needs another draft.

[Return.releaseParamCaps]
> If true, all capabilities that were in the params should be considered released.

Just to be sure: as if the sender had sent a release message for each one with `count=1`?

(I might be wrong on this point, it's been a while since I've looked. The docs should probably spell this out.) Usually. The list of CapDescriptors in a Payload could point to the same capability multiple times. A release message of count=1 per CapDescriptor is a more accurate way of phrasing this.

[Payload]
Why is it not possible to send exceptions in payloads? Should I export each
broken capability as an export and then immediately send a Resolve for each
one, resolving it to an exception?

Payload is only used for parameters and results. It doesn't make sense for parameters to be an exception, and results is inside a union where you could specify an exception that is an alternative. I'm not sure I understand the use-case where you are sending a broken capability.

[Resolve]
> When an export ID sent over the wire (e.g. in a `CapDescriptor`) is indicated to be a promise,
> this indicates that the sender will follow up at some point with a `Resolve` message. If the
> same `promiseId` is sent again before `Resolve`, still only one `Resolve` is sent. If the
> same ID is sent again later _after_ a `Resolve`, it can only be because the export's
> reference count hit zero in the meantime and the ID was re-assigned to a new export, therefore
> this later promise does _not_ correspond to the earlier `Resolve`.

It's not clear to me why it is useful for the receiver to know this.
Presumably the sender can't reuse an export ID until the receiver explicitly releases it anyway.
Should an implementation keep track of whether a resolve has arrived yet and behave differently based on this when it sees an export ID?

It's more specifying that the receiver should not resolve the promise more than once. I believe in this case that it would be a protocol violation, in which case the correct behavior would be for the receiver to send an abort.

> The sender promises that from this point forth, until `promiseId` is released, it shall
> simply forward all messages to the capability designated by `cap`.

Does something similar apply to Return messages? Might be worth mentioning it there too.

I believe so, but I don't know/remember. :(

[Disembargo]
> Embargos are used to enforce E-order in the presence of promise resolution. That is, if an
> application makes two calls foo() and bar() on the same capability reference, in that order,
> the calls should be delivered in the order in which they were made. But if foo() is called
> on a promise, and that promise happens to resolve before bar() is called, then the two calls
> may travel different paths over the network, and thus could arrive in the wrong order. In
> this case, the call to `bar()` must be embargoed, and a `Disembargo` message must be sent along
> the same path as `foo()` to ensure that the `Disembargo` arrives after `foo()`.

What does "this case" refer to? When exactly is an embargo needed, and when not?

If you're implementing level 1 (two-party), then really the only place where this applies is when you receive a capability that the receiver hosts as part of a return or resolve after you have made calls on the promised capability. This implies that the RPC system needs to keep track of which parts of the answer have had calls made on them. When this occurs, the receiver gives the application code an embargoed client, and then sends a Disembargo with senderLoopback set. It releases the embargo once the same disembargo ID is returned with receiverLoopback set.
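
The bookkeeping described above could be sketched roughly as follows, in Python. This is only an illustration under stated assumptions: the `Question` class, its `used_paths` set, and the `needs_embargo` helper are hypothetical names, not part of any Cap'n Proto implementation, and the hosting vat of a capability is reduced to a string tag.

```python
class Question:
    """Hypothetical record for one outstanding question (names are illustrative)."""

    def __init__(self, question_id):
        self.question_id = question_id
        # Paths into the promised answer that have had pipelined calls sent on them.
        self.used_paths = set()

    def pipeline_call(self, path):
        # Record that a call was sent to the remote vat via this path.
        self.used_paths.add(tuple(path))

    def needs_embargo(self, path, hosted_by):
        # An embargo is only needed when the result turns out to be hosted
        # locally (from our point of view) AND earlier calls on that path may
        # still be in flight via the remote vat.
        return hosted_by == "local" and tuple(path) in self.used_paths


q = Question(1)
q.pipeline_call(["x"])
embargo_x = q.needs_embargo(["x"], hosted_by="local")    # pipelined, hosted locally
embargo_y = q.needs_embargo(["y"], hosted_by="local")    # never pipelined
embargo_r = q.needs_embargo(["x"], hosted_by="remote")   # pipelined, hosted remotely
```

Only the first case would lead to an embargoed client and a Disembargo with senderLoopback set; the other two can be used directly.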

For me, this was the hardest part of the spec to understand. I understand why it's needed, but it's really hard to grok the implications.

> There are two particular cases where embargos are important. Consider object Alice, in Vat A,
> who holds a promise P, pointing towards Vat B, that eventually resolves to Carol.

Could Carol be another promise here? Should Alice wait until the target is fully resolved before doing a disembargo, or do a disembargo for each step?

See above explanation. But no, Carol cannot be a promise, since the only time that an embargo is triggered is once you get back a locally hosted capability.

[Accept]
> This message is also used to pick up a redirected return -- see `Return.redirect`.

`redirect` doesn't appear anywhere else in this spec. I guess it's `Return.sendResultsTo.thirdParty`.

Probably. It's Level 3, so it's invisible to me. :D

[Network-specific Parameters]
> For interaction over the global internet between parties with no other prior arrangement, a
> particular set of bindings for these types is defined elsewhere. (TODO(someday): Specify where
> these common definitions live.)

Do these definitions exist now?

¯\_(ツ)_/¯

Thanks!

--
You received this message because you are subscribed to the Google Groups "Cap'n Proto" group.
To unsubscribe from this group and stop receiving emails from it, send an email to capnproto+...@googlegroups.com.
Visit this group at https://groups.google.com/group/capnproto.

Daniel Sank

Jul 19, 2017, 6:56:12 PM

to Ross Light, Thomas Leonard, Cap'n Proto

Allow me to state the obvious: whatever comes out of this discussion should probably result in an update of the spec file. Searching mailing list history is great but... oh wait never mind searching mailing list history is terrible :-)

Daniel Sank

Ross Light

Jul 19, 2017, 9:00:08 PM

to Daniel Sank, Thomas Leonard, Cap'n Proto

Good point. I probably should have cut to the chase and sent my response as a PR. I'll do that as soon as I'm back at a non-phone keyboard.


Kenton Varda

Jul 20, 2017, 12:19:35 AM

to Ross Light, Thomas Leonard, Cap'n Proto

On Wed, Jul 19, 2017 at 2:57 PM, Ross Light <ro...@zombiezen.com> wrote:

Replies inline (with the disclaimer that I'm not Kenton, my only credentials are that I have stared at this file for a long time):

On Wed, Jul 19, 2017 at 1:46 PM Thomas Leonard <tal...@gmail.com> wrote:
Hi,

I'm trying to write an implementation of the RPC spec (level 1, in OCaml). I found a few parts of the spec unclear - could someone clarify them for me?

It says:

[ExportId]
> The exporter chooses an ID before sending a capability over the wire. If
> the capability is already in the table, the exporter should reuse the same ID.

But later:

[CapDescriptor]
> senderHosted @1 :ExportId;
> A capability newly exported by the sender. This is the ID of the new capability in the
> sender's export table (receiver's import table).

How can the exporter reuse the same ID, if it has to be newly exported?

That seems like a doc/spec typo. You can always specify an existing capability. I think the wording should be something like: "A capability exported by the sender. This may or may not be a new ID in the sender's export table (receiver's import table)."

Correct.

[Message]
> This could be e.g. because the sender received an invalid or nonsensical
> message (`isCallersFault` is true) or because the sender had an internal error
> (`isCallersFault` is false).

isCallersFault appears to be deprecated (`obsoleteIsCallersFault` appears much later).

Yup, Exception has changed (IMO for the better). Instead of placing blame on sender or receiver (such distinctions are hard to draw in general), exceptions are now about what action the caller is advised to take based on the failure.

Correct.

[Call.sendResultsTo]
> When `yourself` is used, the receiver must still send a `Return` for the call, but sets the
> field `resultsSentElsewhere` in that `Return` rather than including the results.

When should `resultsSentElsewhere` be returned? Once the result is known? Or
once the first takeFromOtherQuestion collects it?

(I haven't implemented this for Go yet, but want to.) AFAICT resultsSentElsewhere should be sent once the result is known.

I think the answer here is "it doesn't really matter".

When Alice calls Bob.foo(), and Bob tail-calls back to Alice.bar(), Bob sends the Call to bar() with "send to yourself" *immediately* followed by the Return for foo() with "take from other question". Eventually Alice sends a Return for bar(), but Bob doesn't really do anything with this Return, so it actually doesn't matter when it is sent. That said, the C++ implementation appears to wait for bar() to finish before sending the Return.

Can takeFromOtherQuestion be used more than once for a single source question?

I would assume that it could be used until a Finish message is sent for that question, much like other question-based data. In practice, every call's result is held in the answers table until Finish is received.

No, it can only be used once.

For languages without garbage collection, it would be annoying for the protocol to specify that some messages can potentially be shared.
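
A minimal sketch of the "only once" rule, in Python. The `AnswerTable` class and its methods are hypothetical names used for illustration, and treating a second take as an error is an assumption about how an implementation might react, not something the spec text quoted here states.

```python
class AnswerTable:
    """Hypothetical answers table; results of a question may be taken only once."""

    def __init__(self):
        self._results = {}    # question id -> stored result
        self._taken = set()   # question ids whose result was already taken

    def store(self, question_id, result):
        self._results[question_id] = result

    def take_from_other_question(self, question_id):
        if question_id in self._taken:
            raise ValueError("results of question %d already taken" % question_id)
        self._taken.add(question_id)
        # pop() hands the result over, so this table no longer holds it.
        return self._results.pop(question_id)


answers = AnswerTable()
answers.store(7, "results of bar()")
first = answers.take_from_other_question(7)
try:
    answers.take_from_other_question(7)
    second_take_rejected = False
except ValueError:
    second_take_rejected = True
```

Handing the result over with `pop()` is the simplification that makes single ownership possible here; a real implementation may still need to keep the answer reachable for pipelined calls.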

> The `Call` for bar'() has `sendResultsTo` set to `yourself`, with the value being the
> question ID originally assigned to the bar() call.

What does "the value" refer to here? `yourself` has type `Void`.

> Vat B receives the `Return` for bar'() and sends a `Return` for bar(), with
> `receivedFromYourself` set in place of the results.

`receivedFromYourself` does not appear anywhere else in the spec.

I think this whole example is stale and probably needs another draft.

Yeah. I must have had an earlier version where the child call specified its parent, rather than the parent return specifying the child.

[Return.releaseParamCaps]
> If true, all capabilities that were in the params should be considered released.

Just to be sure: as if the sender had sent a release message for each one with `count=1`?

(I might be wrong on this point, it's been a while since I've looked. The docs should probably spell this out.) Usually. The list of CapDescriptors in a Payload could point to the same capability multiple times. A release message of count=1 per CapDescriptor is a more accurate way of phrasing this.

Correct.
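
As a rough Python sketch of "one release of count=1 per CapDescriptor": the function name `release_param_caps` and the plain dict used as an export table are assumptions for illustration only, and the cap table is simplified to a list of export IDs (only sender-hosted descriptors are modeled).

```python
from collections import Counter


def release_param_caps(export_refcounts, cap_table):
    """Drop one reference per CapDescriptor in the params' cap table.

    export_refcounts: dict mapping export ID -> current reference count.
    cap_table: list of export IDs, one per CapDescriptor (IDs may repeat).
    """
    for export_id, count in Counter(cap_table).items():
        export_refcounts[export_id] -= count
        if export_refcounts[export_id] == 0:
            del export_refcounts[export_id]   # ID is free for reuse
    return export_refcounts


refs = {3: 3, 5: 1}
# Export 3 appears twice in the params, so it loses two references.
remaining = release_param_caps(refs, [3, 3, 5])
```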

[Payload]
Why is it not possible to send exceptions in payloads? Should I export each
broken capability as an export and then immediately send a Resolve for each
one, resolving it to an exception?

Payload is only used for parameters and results. It doesn't make sense for parameters to be an exception, and results is inside a union where you could specify an exception that is an alternative. I'm not sure I understand the use-case where you are sending a broken capability.

Correct that Payload isn't relevant to resolving capabilities.

To the original question: Hmm, I guess it would have made sense for CapDescriptor to have an additional variant for an already-broken capability, to avoid the need for an extra Resolve.

However, yes, in practice, the C++ implementation will introduce a promise-capability and then immediately send a Resolve resolving it to an exception.
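
The promise-then-Resolve approach might look roughly like this in Python. Messages are modeled as plain dicts whose keys mirror field names from the RPC schema (`senderPromise`, `promiseId`, `exception`, `reason`, `type`); the helper `export_broken_cap` is a hypothetical name, and this is not how the C++ implementation is actually structured.

```python
def export_broken_cap(next_export_id, reason):
    """Return (cap_descriptor, follow_up_message) for an already-broken capability."""
    # Introduce the capability as a promise in the outgoing cap table...
    descriptor = {"senderPromise": next_export_id}
    # ...and queue a Resolve for the same ID that resolves it to an exception.
    resolve = {
        "resolve": {
            "promiseId": next_export_id,
            "exception": {"reason": reason, "type": "failed"},
        }
    }
    return descriptor, resolve


descriptor, resolve = export_broken_cap(4, "object was destroyed")
```

The `resolve` message would be sent right after the message carrying `descriptor`.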

[Resolve]
> When an export ID sent over the wire (e.g. in a `CapDescriptor`) is indicated to be a promise,
> this indicates that the sender will follow up at some point with a `Resolve` message. If the
> same `promiseId` is sent again before `Resolve`, still only one `Resolve` is sent. If the
> same ID is sent again later _after_ a `Resolve`, it can only be because the export's
> reference count hit zero in the meantime and the ID was re-assigned to a new export, therefore
> this later promise does _not_ correspond to the earlier `Resolve`.

It's not clear to me why it is useful for the receiver to know this.
Presumably the sender can't reuse an export ID until the receiver explicitly releases it anyway.
Should an implementation keep track of whether a resolve has arrived yet and behave differently based on this when it sees an export ID?

It's more specifying that the receiver should not resolve the promise more than once. I believe in this case that it would be a protocol violation, in which case the correct behavior would be for the receiver to send an abort.

The text here is me trying to prove that it's safe for the protocol to specify that only one Resolve message is sent no matter how many times the export ID was introduced. I'm trying to show that there are no race conditions caused by messages travelling in opposite directions passing each other in flight.

Specifically, the rule I'm stating is: After you send a Resolve message for a promise, you *cannot* attempt to reference the same promise again in a subsequent message. The associated export ID is off-limits until it has been released. Once released, it can be reused as normal. Put another way, after a Resolve, the export ID's refcount is only allowed to decrease until it hits zero, and then it can increase again.

The reason for this is that if you referenced an already-resolved promise, it's possible that the other end has already released the import and sent a Release message before it receives the new reference. In that case, when it receives the new reference, it will mistakenly believe that it's hearing about an all-new promise, and will expect a new Resolve message, which will never come.

Luckily, there's no need to reference a promise again once it has been resolved. It always makes sense to reference the object it resolved to instead. However, to get this right may require some bookkeeping in the implementation.

(If not for this rule, then I would have had to make a different rule instead: I would have had to say that if you reference a promise again after resolving it, then you need to send another Resolve message. But that would have just been wasteful.)
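
The bookkeeping this rule implies could be sketched as follows, in Python. `ExportTable`, `ExportEntry`, and their methods are hypothetical names for illustration; raising an error on a forbidden reference is just one way to surface the mistake (an implementation would more likely substitute the resolved target).

```python
class ExportEntry:
    def __init__(self, target, is_promise):
        self.target = target
        self.is_promise = is_promise
        self.refcount = 0
        self.resolved = False


class ExportTable:
    """Hypothetical export table enforcing 'no new references after Resolve'."""

    def __init__(self):
        self.entries = {}

    def reference(self, export_id, target=None, is_promise=False):
        entry = self.entries.get(export_id)
        if entry is None:
            entry = self.entries[export_id] = ExportEntry(target, is_promise)
        elif entry.resolved:
            # Off-limits: the peer may already have a Release in flight.
            raise RuntimeError("export %d was resolved; reference its target" % export_id)
        entry.refcount += 1

    def mark_resolved(self, export_id):
        self.entries[export_id].resolved = True

    def release(self, export_id, count):
        entry = self.entries[export_id]
        entry.refcount -= count
        if entry.refcount == 0:
            del self.entries[export_id]   # ID may now be reused as normal


table = ExportTable()
table.reference(1, target="promise", is_promise=True)
table.reference(1)                 # sent again before Resolve: allowed
table.mark_resolved(1)
try:
    table.reference(1)             # after Resolve: refcount may only decrease
    rejected = False
except RuntimeError:
    rejected = True
table.release(1, 2)
table.reference(1, target="new export")   # ID reused after refcount hit zero
```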

> The sender promises that from this point forth, until `promiseId` is released, it shall
> simply forward all messages to the capability designated by `cap`.

Does something similar apply to Return messages? Might be worth mentioning it there too.

I believe so, but I don't know/remember. :(

The comment here is referring to a rule that resolves the Tribble 4-way Race Condition, which relates to embargoes, which I'll discuss below.

The rule says that once you've declared that promise P resolves to capability C, then any future message addressed to promise P shall be forwarded to capability C -- *even if* capability C itself turns out to be a promise which resolves to D. Even after the resolution, messages to P cannot be forwarded directly to D -- they *must* be forwarded to C (which will then forward to D).

A similar requirement applies to returns, yes.
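
A toy Python model of the forwarding rule, with all vats collapsed into one process. The `Cap` class is a hypothetical stand-in, not an API from any implementation; the point is only that a message sent to P always passes through C, even after C has resolved to D.

```python
class Cap:
    """Hypothetical capability that may later resolve to another capability."""

    def __init__(self, name):
        self.name = name
        self.resolution = None   # set once, when this promise resolves

    def resolve(self, target):
        self.resolution = target

    def deliver(self, message, trace):
        trace.append(self.name)
        if self.resolution is not None:
            # Forward to exactly what we resolved to, never past it.
            self.resolution.deliver(message, trace)


p, c, d = Cap("P"), Cap("C"), Cap("D")
p.resolve(c)
c.resolve(d)
trace = []
p.deliver("foo", trace)   # visits P, then C, then D
```

A shortcut from P straight to D would skip C's hop, which is exactly what the rule forbids.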

[Disembargo]
> Embargos are used to enforce E-order in the presence of promise resolution. That is, if an
> application makes two calls foo() and bar() on the same capability reference, in that order,
> the calls should be delivered in the order in which they were made. But if foo() is called
> on a promise, and that promise happens to resolve before bar() is called, then the two calls
> may travel different paths over the network, and thus could arrive in the wrong order. In
> this case, the call to `bar()` must be embargoed, and a `Disembargo` message must be sent along
> the same path as `foo()` to ensure that the `Disembargo` arrives after `foo()`.

What does "this case" refer to? When exactly is an embargo needed, and when not?

If you're implementing level 1 (two-party), then really the only place where this applies is when you receive a capability that the receiver hosts as part of a return or resolve after you have made calls on the promised capability. This implies that the RPC system needs to keep track of which parts of the answer have had calls made on them. When this occurs, the receiver gives the application code an embargoed client, and then sends a Disembargo with senderLoopback set. It releases the embargo once the same disembargo ID is returned with receiverLoopback set.

For me, this was the hardest part of the spec to understand. I understand why it's needed, but it's really hard to grok the implications.

Example:

1. Alice -- in Vat A -- holds a capability C which is a promise pointing towards Vat B.

2. Alice calls foo() on C. This call is sent to Vat B.

3. Vat B informs Vat A that C has resolved, and it points to Carol, who also lives in Vat A. Thus, future calls can be made directly in-process. (However, the call to foo() is still in-flight.)

4. Alice calls bar() on C. Since the promise has resolved, this call can be delivered locally directly to Carol.

5. Vat B reflects the call to foo() back to Carol in Vat A.

In a naive implementation, the bar() call will arrive at Carol before the foo() call does, which is wrong.

We need to introduce embargoes to prevent this:

1. Alice -- in Vat A -- holds a capability C which is a promise pointing towards Vat B.

2. Alice calls foo() on C. This call is sent to Vat B.

3. Vat B informs Vat A that C has resolved, and it points to Carol, who also lives in Vat A. Thus, future calls can be made directly in-process. (However, the call to foo() is still in-flight.)

3.1 Vat A marks C as pointing to Carol, but embargoed.

3.2 Vat A sends a Disembargo message towards Vat B, addressed to the original promise. (It then releases the promise.)

4. Alice calls bar() on C. Since the promise has resolved, this call can be delivered locally directly to Carol.

4.1 Because C is marked embargoed, the message is held in a queue, not delivered yet.

5. Vat B reflects the call to foo() back to Carol in Vat A.

5.1 Vat A delivers foo() to Carol.

5.2 Vat B reflects the Disembargo back to Carol in Vat A.

5.3 Vat A processes the Disembargo, which releases the embargo on the capability C.

5.4 The bar() call, which was previously embargoed, is now delivered to Carol.
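
The embargoed sequence above can be simulated in a few lines of Python. This is a simplified, single-process sketch: `EmbargoedCap` is a hypothetical class, Carol is represented by a list that records delivery order, and the round trip through Vat B is modeled as a FIFO queue.

```python
from collections import deque


class EmbargoedCap:
    """Hypothetical local reference that queues calls until disembargoed."""

    def __init__(self, target):
        self.target = target          # list standing in for Carol's inbox
        self.embargoed = True
        self.queue = deque()

    def call(self, method):
        if self.embargoed:
            self.queue.append(method)     # step 4.1: hold the call
        else:
            self.target.append(method)

    def disembargo(self):
        self.embargoed = False            # step 5.3
        while self.queue:
            self.target.append(self.queue.popleft())   # step 5.4


carol = []                         # records the order in which Carol sees calls
via_vat_b = deque(["foo"])         # step 2: foo() is in flight via Vat B
c = EmbargoedCap(carol)            # step 3.1: resolved to Carol, but embargoed
via_vat_b.append("disembargo")     # step 3.2: follows foo() on the same path
c.call("bar")                      # steps 4 and 4.1
while via_vat_b:                   # steps 5 to 5.4: Vat B reflects in order
    item = via_vat_b.popleft()
    if item == "disembargo":
        c.disembargo()
    else:
        carol.append(item)
```

Because the Disembargo travels behind foo() on the same path, Carol sees foo() before bar().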

> There are two particular cases where embargos are important. Consider object Alice, in Vat A,
> who holds a promise P, pointing towards Vat B, that eventually resolves to Carol.

Could Carol be another promise here? Should Alice wait until the target is fully resolved before doing a disembargo, or do a disembargo for each step?

See above explanation. But no, Carol cannot be a promise, since the only time that an embargo is triggered is once you get back a locally hosted capability.

Actually Carol could be a promise -- a promise currently hosted in Vat A. (The original promise, which resolved, was hosted in Vat B, but it resolved to another promise, back in Vat A.)

This situation leads to the Tribble 4-way Race Condition, which is described in the Disembargo doc comment.

The short version is that once Vat B declares to Vat A that the promise has resolved to Carol, then Vat B must always forward all messages addressed to the promise directly to Carol. Even if Vat A later declares that Carol has further resolved to Dave, Vat B cannot start forwarding messages directly to Dave -- it must keep sending them to Carol.

[Accept]
> This message is also used to pick up a redirected return -- see `Return.redirect`.

`redirect` doesn't appear anywhere else in this spec. I guess it's `Return.sendResultsTo.thirdParty`.

Probably. It's Level 3, so it's invisible to me. :D

Correct.

[Network-specific Parameters]
> For interaction over the global internet between parties with no other prior arrangement, a
> particular set of bindings for these types is defined elsewhere. (TODO(someday): Specify where
> these common definitions live.)

Do these definitions exist now?

¯\_(ツ)_/¯

They do not. :(

I agree the docs should be updated here. It'd be great if someone who is not me wanted to make the changes since other people can probably better-identify which bits are confusing than I can... :)

-Kenton

Ross Light

Jul 20, 2017, 12:54:17 AM

to Kenton Varda, Thomas Leonard, Cap'n Proto

I sent #522 to fix the parts we identified as wrong.

So while most of the implications in here can be inferred from the spec, I can understand why they are not part of the spec. What might be good is to have a non-normative comment section that's an "Implementer's Guide" that spells out some of these pitfalls. The guidance from this old thread is definitely not obvious, but is implied by other parts of the spec. I'm struggling with some of those implications now, but I'll start a separate thread.

Daniel Sank

Jul 20, 2017, 1:08:23 AM

to Ross Light, Thomas Leonard, Kenton Varda, Cap'n Proto

+1 for non-normative documentation.


Ian Denhardt

Jul 20, 2017, 1:31:18 AM

to 'Kenton Varda' via Cap'n Proto, Kenton Varda, Ross Light, Thomas Leonard, Cap'n Proto

Quoting 'Kenton Varda' via Cap'n Proto (2017-07-20 00:18:53)

> [Message]
> > This could be e.g. because the sender received an invalid or nonsensical
> > message (`isCallersFault` is true) or because the sender had an internal error
> > (`isCallersFault` is false).
>
> isCallersFault appears to be deprecated (`obsoleteIsCallersFault`
> appears much later).
>
> Yup, Exception has changed (IMO for the better). Instead of placing
> blame on sender or receiver (such distinctions are hard to draw in
> general), exceptions are now about what action the caller is advised
> to take based on the failure.
>
> Correct.

Yep, and FWIW, this is fixed on master -- the parentheticals have been
removed.


Thomas Leonard

Jul 20, 2017, 7:02:53 AM

to Kenton Varda, Ross Light, Cap'n Proto

On 20 July 2017 at 05:18, Kenton Varda <ken...@cloudflare.com> wrote:
> On Wed, Jul 19, 2017 at 2:57 PM, Ross Light <ro...@zombiezen.com> wrote:
>> On Wed, Jul 19, 2017 at 1:46 PM Thomas Leonard <tal...@gmail.com> wrote:

[ thanks for the replies - snipping the bits I now understand ]

>>> Can takeFromOtherQuestion be used more than once for a single source
>>> question?
>>
>> I would assume that it could be used until a Finish message is sent for that
>> question, much like other question-based data. In practice, every call's
>> result is held in the answers table until Finish is received.
>
> No, it can only be used once.
>
> For languages without garbage collection, it would be annoying for the
> protocol to specify that some messages can potentially be shared.

I thought that must be the reason originally, but it seems that
takeFromOtherQuestion requires sharing even if it can only be used
once, because the struct is held by the original answer (for
pipelining) and also by the question that took it.

>>> [Return.releaseParamCaps]
>>> > If true, all capabilities that were in the params should be considered
>>> > released.
>>>
>>> Just to be sure: as if the sender had sent a release message for each one
>>> with `count=1`?
>>
>> (I might be wrong on this point, it's been a while since I've looked. The
>> docs should probably spell this out.) Usually. The list of CapDescriptors
>> in a Payload could point to the same capability multiple times. A release
>> message of count=1 per CapDescriptor is a more accurate way of phrasing
>> this.

Good point (and that is what my implementation does).

Maybe I got this bit wrong. I attached the "used" flags to the
question, but maybe I should be tagging the reference to the question
instead. Can different references to the same question need different
disembargoes? e.g. should forwarding a message mark the promised
answer as needing a disembargo or not?

> Example:

That example is straightforward, but there are more complex cases
that are unclear to me. Here's one I'm not sure about:

There are two vats, Client and Server, each of which starts with a
reference to the other's bootstrap service. All calls either return a
single capability (field-name `x`) or Void.

1. Client makes a call, q1, on the server's bootstrap object, getting
a promise a=q1.x
2. Client makes another call, q2, on the same target, getting promise b=q2.x.
3. Server asks one question, q3 (c=q3.x)
4. Client responds to q3 with a (the unresolved promised cap from its q1)
5. Server responds to q1 with client_bs (the client's bootstrap
service, resolved)
6. Server responds to q2 with c (q3.x, still unresolved)
7. Client makes call m1 on b (sent to q2)
8. Client receives response a=client_bs (no embargo needed)
9. Client receives response b=q3.x, which is a.
This was q1.x at the time q3 returned, but client_bs now. Which
should it use?
If client_bs, it embargoes the target due to m1.
If not, b now points at the returned q1, which seems odd.
10. Client makes call m2 on b (which is then held at the embargo).
11. Server receives response that c=q1.x (which is client_bs).
12. Server receives m1 and forwards it to q3.x.
13. Server sends disembargo response back to client.
14. Client receives m1 and forwards it to q1 (the resolution it gave for q3).
15. Client disembargoes b and sends m2 to client_bs.

(sorry if this is unclear - I've put up a diagram of it here:
https://github.com/mirage/capnp-rpc/issues/59)

With b set to the embargoed client_bs in step 9, m2 arrives before m1
(which is now on its way back to the server again). Perhaps a second
disembargo is needed?

Alternatively, if b was set to point at q1.x, I think the messages
will arrive in the correct order.
But then b ends up pointing to the answered q1, rather than directly
at client_bs.

As a third alternative, if messages are sent to their current, more
resolved destination (ignoring the spec) then this case works, but can
be made to break by sending another message down q3 before m1 arrives
at c. Then the server embargoes m1, and m2 arrives first again.

Could someone point me in the right direction?

Thanks,

--
talex5 (GitHub/Twitter) http://roscidus.com/blog/
GPG: 5DD5 8D70 899C 454A 966D 6A51 7513 3C8F 94F6 E0CC

Kenton Varda

Jul 20, 2017, 2:39:11 PM

to Thomas Leonard, Ross Light, Cap'n Proto

On Thu, Jul 20, 2017 at 4:02 AM, Thomas Leonard <tal...@gmail.com> wrote:

> I thought that must be the reason originally, but it seems that
> takeFromOtherQuestion requires sharing even if it can only be used
> once, because the struct is held by the original answer (for
> pipelining) and also by the question that took it.

True, but this is a restricted case, and may still allow the implementation more freedom than general sharing would. For example, for pipelining purposes, technically the implementation only needs to keep the capabilities around, along with remembering their pointer paths. It doesn't otherwise need to remember the content of the response.
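
As a rough sketch of that idea, assuming a pointer path is modelled as a list of pointer-field indices (in the spirit of `PromisedAnswer.transform`) and capabilities are plain strings for illustration, an answer retained only for pipelining could be as small as this. The type and function names are hypothetical:

```ocaml
(* A pointer path: the sequence of pointer-field indices leading to a capability. *)
type path = int list

(* Only the capabilities and their paths are kept; the rest of the
   response content has been dropped. *)
type retained_answer = { caps : (path * string) list }

let retain (caps : (path * string) list) : retained_answer = { caps }

(* Resolve a pipelined call's target, if the path names a capability. *)
let pipeline_target (answer : retained_answer) (p : path) : string option =
  List.assoc_opt p answer.caps

let () =
  let answer = retain [ ([0], "cap-x"); ([1; 2], "cap-y") ] in
  assert (pipeline_target answer [0] = Some "cap-x");
  assert (pipeline_target answer [3] = None)
```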

>> If you're implementing level 1 (two-party), then really the only place
>> where this applies is when you receive a capability that the receiver hosts
>> as part of a return or resolve after you have made calls on the promised
>> capability. This implies that the RPC system needs to keep track of which
>> parts of the answer have had calls made on them. When this occurs, the
>> receiver gives the application code an embargoed client, and then sends a
>> Disembargo with senderLoopback set. It releases the embargo once the same
>> disembargo ID is returned with receiverLoopback set.

> Maybe I got this bit wrong. I attached the "used" flags to the
> question, but maybe I should be tagging the reference to the question
> instead. Can different references to the same question need different
> disembargoes? e.g. should forwarding a message mark the promised
> answer as needing a disembargo or not?

Sorry, I don't understand your question here.

Nice example!

It looks like the C++ implementation today will decide b = q1.x, and never allow it to further resolve to client_bs. This "works" but is clearly suboptimal.

For a correct solution, we need to recognize that Disembargo messages can "bounce" multiple times:

The disembargo sent in step 9 has a final destination of client_bs.

In order to get there, it has to bounce back and forth between the client and server twice:

* The client sends it towards q2.x.

* The server, recognizing that it resolved q2.x to q3.x, reflects the embargo towards q3.x.

* The client, recognizing that it resolved q3.x to q1.x, reflects back to the server again.

* The server, recognizing that q1.x resolved to client_bs, finally reflects back to client_bs.

This gives m1 enough time to arrive before the disembargo.
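
A toy model of the bouncing, assuming the resolution chain is written out as nested promises (the `target` type and `hops` function are illustrative only, not how any implementation stores this). It counts the reflections between the initial send and the final destination:

```ocaml
type target =
  | Promise of string * target  (* a promise and what it was resolved to *)
  | Settled of string           (* the final destination *)

(* Returns (number of reflections, name of the final destination). *)
let rec hops = function
  | Settled name -> (0, name)
  | Promise (_, next) ->
    let n, final = hops next in
    (n + 1, final)

let () =
  (* q2.x resolved to q3.x, which resolved to q1.x, which resolved to client_bs. *)
  let chain =
    Promise ("q2.x", Promise ("q3.x", Promise ("q1.x", Settled "client_bs")))
  in
  assert (hops chain = (3, "client_bs"))
```

The three reflections correspond to the last three bullets above; the first bullet is the client's initial send towards q2.x.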

It looks like this is not implemented correctly in C++ currently. It appears the C++ implementation ignores disembargo.messageTarget in the case that the Disembargo has type `receiverLoopback`. This is incorrect -- it needs to verify that the embargo has reached its final destination, not an intermediate promise. (However, it is "saved" by the suboptimal behavior mentioned above.)

On another note, you say you found this with AFL, which is amazing. Could your fuzzing strategy be applied to the C++ implementation as well?

-Kenton

Thomas Leonard

Jul 20, 2017, 4:05:23 PM

to Kenton Varda, Ross Light, Cap'n Proto

On 20 July 2017 at 19:38, Kenton Varda <ken...@cloudflare.com> wrote:
> On Thu, Jul 20, 2017 at 4:02 AM, Thomas Leonard <tal...@gmail.com> wrote:
>>
>> I thought that must be the reason originally, but it seems that
>> takeFromOtherQuestion requires sharing even if it can only be used
>> once, because the struct is held by the original answer (for
>> pipelining) and also by the question that took it.
>
>
> True, but this is a restricted case, and may still allow the implementation
> more freedom than general sharing would. For example, for pipelining
> purposes, technically the implementation only needs to keep the capabilities
> around, along with remembering their pointer paths. It doesn't otherwise
> need to remember the content of the response.

I did start off trying to implement it that way, but then I realised
that questions don't usually hang around long anyway, so it didn't
seem worth the effort.

>> >> If you're implementing level 1 (two-party), then really the only place
>> >> where this applies is when you receive a capability that the receiver
>> >> hosts
>> >> as part of a return or resolve after you have made calls on the
>> >> promised
>> >> capability. This implies that the RPC system needs to keep track of
>> >> which
>> >> parts of the answer have had calls made on them. When this occurs, the
>> >> receiver gives the application code an embargoed client, and then sends
>> >> a
>> >> Disembargo with senderLoopback set. It releases the embargo once the
>> >> same
>> >> disembargo ID is returned with receiverLoopback set.
>>
>> Maybe I got this bit wrong. I attached the "used" flags to the
>> question, but maybe I should be tagging the reference to the question
>> instead. Can different references to the same question need different
>> disembargoes? e.g. should forwarding a message mark the promised
>> answer as needing a disembargo or not?
>
> Sorry, I don't understand your question here.

Maybe it doesn't make sense, or only with my implementation, but it
seems we have two objects for a question/export:

- a proxy that always sends to the remote peer
- a switchable proxy that forwards to the previous object until the
question returns, and then sends to the new target (possibly after a
disembargo)

I was just wondering which proxy should track whether it has been used
(and, therefore, whether it needs a disembargo).

If we had the implementor's guide that was mentioned earlier, it could
probably cover this. My current implementation muddles these two up,
which is why it's delivering things out of order, so my question is
probably muddled up too. I'll need to think about this a bit more.

Does it alternate between being a disembargo request and a disembargo
response as this happens?
Does the 3-vat case complicate things?

> The disembargo sent in step 9 has a final destination of client_bs.
>
> In order to get there, it has to bounce back and forth between the client
> and server twice:
> * The client sends it towards q2.x.
> * The server, recognizing that it resolved q2.x to q3.x, reflects the
> embargo towards q3.x.
> * The client, recognizing that it resolved q3.x to q1.x, reflects back to
> the server again.
> * The server, recognizing that q1.x resolved to client_bs, finally reflects
> back to client_bs.
>
> This gives m1 enough time to arrive before the disembargo.
>
> It looks like this is not implemented correctly in C++ currently. It appears
> the C++ implementation ignores disembargo.messageTarget in the case that the
> Disembargo has type `receiverLoopback`. This is incorrect -- it needs to
> verify that the embargo has reached its final destination, not an
> intermediate promise. (However, it is "saved" by the suboptimal behavior
> mentioned above.)

OK, I'll try to match the C++ behaviour for now.

> On another note, you say you found this with AFL, which is amazing. Could
> your fuzzing strategy be applied to the C++ implementation as well?

Maybe. Here's how it works:

To simplify things, my OCaml capnp-rpc library is in two parts. One
provides the RPC logic over abstract message types, and the other
provides an implementation using the Cap'n Proto serialisation for the
messages. Most of the unit-tests check the core logic directly, using
a simpler message type where a payload is just a test string and an
array of capability pointers. The fuzz tests use a mutable struct with
things useful for checking for violations.

The fuzz tests set up some vats (two or three) in a single process and
then have them perform operations based on input from the fuzzer.
Each step selects one vat and performs a random (fuzzer-chosen)
operation out of:

1. Request a bootstrap capability from a random peer.
2. Handle one message on the incoming queue.
3. Call a random capability, passing randomly-selected capabilities as
arguments.
4. Finish a random question.
5. Release a random capability.
6. Add a capability to a newly-created local service.
7. Answer a random question, passing randomly-selected capabilities as
the response.

When it runs out of input data from the fuzzer it releases all
capabilities, answers all questions and allows the system to become
idle.
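
Roughly, the step selection could be sketched like this. The two-bytes-per-step encoding and all names here are assumptions made for illustration; the real tests may decode the fuzzer input differently:

```ocaml
type op =
  | Bootstrap | Handle_message | Call | Finish | Release | Add_service | Answer

let ops = [| Bootstrap; Handle_message; Call; Finish; Release; Add_service; Answer |]

(* Assumed encoding: each step consumes two bytes of fuzzer input.
   The first picks the vat, the second picks the operation.
   A trailing odd byte is ignored. *)
let decode ~n_vats (input : string) : (int * op) list =
  let rec go i acc =
    if i + 1 >= String.length input then List.rev acc
    else
      let vat = Char.code input.[i] mod n_vats in
      let op = ops.(Char.code input.[i + 1] mod Array.length ops) in
      go (i + 2) ((vat, op) :: acc)
  in
  go 0 []

let () =
  assert (decode ~n_vats:2 "\001\002\000\006" = [ (1, Call); (0, Answer) ])
```

Taking everything modulo the number of choices means every input decodes to a valid sequence of steps, so the fuzzer never wastes time on inputs that are rejected up front.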

The fuzz tests include in the call's payload contents a sequence
number and a (mutable) struct containing the source reference's
counters:

type cap_ref_counters = {
  mutable next_to_send : int;
  mutable next_expected : int;
}

When the message arrives, the target service checks that the counter
in the content matches the current value of `next_expected` and
increments it.
So, it should always detect if messages arrive out of order.
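
A minimal sketch of that check, reusing the `cap_ref_counters` type above. The helper names `send` and `deliver` are hypothetical:

```ocaml
type cap_ref_counters = {
  mutable next_to_send : int;
  mutable next_expected : int;
}

(* Sender side: stamp the message with the next sequence number. *)
let send counters =
  let seq = counters.next_to_send in
  counters.next_to_send <- seq + 1;
  seq

(* Receiver side: returns false if the message arrived out of order. *)
let deliver counters seq =
  if seq = counters.next_expected then begin
    counters.next_expected <- seq + 1;
    true
  end else false

let () =
  let c = { next_to_send = 0; next_expected = 0 } in
  let m1 = send c in
  let m2 = send c in
  assert (not (deliver c m2));  (* m2 overtook m1: violation detected *)
  assert (deliver c m1);
  assert (deliver c m2)
```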

Another way it takes advantage of everything running in one process is
that it maintains a second reference graph, but one which doesn't use
CapTP. When it requests a bootstrap capability over CapTP, it also
returns a direct pointer to the target service. So, it's a copy of the
reference graph but with all vat-spanning links replaced with direct
pointers. Then, it checks that every message is delivered to the
service it would have been delivered to if there were no network in
the way.
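
One way to picture that check, as a sketch only: each message carries the service that the direct (non-CapTP) reference pointed at when it was sent, and delivery compares it with the service that actually received it. The record fields and the `deliver` function are assumptions for illustration:

```ocaml
type service = { name : string; mutable delivered : int }

type message = {
  body : string;
  expected : service;  (* target according to the direct reference graph *)
}

(* Called when a message reaches a service over the (simulated) network. *)
let deliver (actual : service) (msg : message) : (unit, string) result =
  if actual == msg.expected then begin
    actual.delivered <- actual.delivered + 1;
    Ok ()
  end else
    Error
      (Printf.sprintf "%S reached %s but the direct graph says %s"
         msg.body actual.name msg.expected.name)

let () =
  let a = { name = "a"; delivered = 0 } in
  let b = { name = "b"; delivered = 0 } in
  let m = { body = "ping"; expected = a } in
  assert (deliver a m = Ok ());
  assert (deliver b m <> Ok ());
  assert (a.delivered = 1)
```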

I leave AFL running my binary for a while with the --fuzz option
(which disables logging to keep things fast).
When it finds a violation it leaves it in the crash directory. Then I
run the fuzz binary on it manually without --fuzz, which turns on
logging and runs a load of sanity checks at each step, as well as
dumping the state of the system at each step. It also outputs an OCaml
unit-test, which can be cut-and-pasted into the test-suite. The
unit-tests look like this, after being cleaned up a bit:

https://github.com/mirage/capnp-rpc/blob/f5a32455c41056eaa40b3074f1cfb33741854e69/test/test.ml#L462

If the test-case is too long, afl-tmin can often shorten it.

There are plenty of other things it could be made to check, e.g. that
forked references can't access anything before their parent, that
messages to valid targets are always eventually delivered, that after
letting the system become idle all references point directly to the
vat containing their target in the direct reference graph, that
malicious vats can't cause protocol violations in connections between
good vats, etc. However, I have enough bugs to fix with the current
tests ;-)

I don't know if that's any use to you - I'm not sure how the C++ code
is structured.

Kenton Varda

Jul 20, 2017, 4:39:44 PM

to Thomas Leonard, Ross Light, Cap'n Proto

Will respond in more detail later, but:

On Thu, Jul 20, 2017 at 1:05 PM, Thomas Leonard <tal...@gmail.com> wrote:

> OK, I'll try to match the C++ behaviour for now.

I don't think there's any need to. The difference in behavior is entirely on the side that initiates the embargoes, and is only to protect invariants on that end. So you don't need cooperation from the other end to implement correct behavior now.

-Kenton

Kenton Varda

Jul 20, 2017, 11:07:52 PM

to Thomas Leonard, Ross Light, Cap'n Proto

On Thu, Jul 20, 2017 at 1:05 PM, Thomas Leonard <tal...@gmail.com> wrote:

> I did start off trying to implement it that way, but then I realised
> that questions don't usually hang around long anyway, so it didn't
> seem worth the effort.

Yes, that's probably the right decision.

>> Maybe I got this bit wrong. I attached the "used" flags to the
>> question, but maybe I should be tagging the reference to the question
>> instead. Can different references to the same question need different
>> disembargoes? e.g. should forwarding a message mark the promised
>> answer as needing a disembargo or not?
>
> Sorry, I don't understand your question here.

> Maybe it doesn't make sense, or only with my implementation, but it
> seems we have two objects for a question/export:
>
> - a proxy that always sends to the remote peer
> - a switchable proxy that forwards to the previous object until the
> question returns, and then sends to the new target (possibly after a
> disembargo)
>
> I was just wondering which proxy should track whether it has been used
> (and, therefore, whether it needs a disembargo).
>
> If we had the implementor's guide that was mentioned earlier, it could
> probably cover this. My current implementation muddles these two up,
> which is why it's delivering things out of order, so my question is
> probably muddled up too. I'll need to think about this a bit more.

OK, makes sense.

> Nice example!
>
> It looks like the C++ implementation today will decide b = q1.x, and never
> allow it to further resolve to client_bs. This "works" but is clearly
> suboptimal.
>
> For a correct solution, we need to recognize that Disembargo messages can
> "bounce" multiple times:

> Does it alternate between being a disembargo request and a disembargo
> response as this happens?

It alternates between `senderLoopback` and `receiverLoopback`, yes.

> Does the 3-vat case complicate things?

Always. :) (But I haven't thought it through lately...)

-Kenton
